Enable paged attention in varlen forward #831

Merged
tridao merged 3 commits into Dao-AILab:main on Mar 15, 2024

Conversation

sgrigory (Contributor) commented Feb 16, 2024

Paged attention was added recently in 54e80a3, but only exposed through fwd_kvcache. This PR exposes it through varlen_fwd as well.
@bottler @danthe3rd
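For reference, a minimal sketch of how the paged path could be driven through the Python varlen interface after this change. The block_table keyword comes from this PR, but the exact tensor shapes and page-size constraints below are my assumptions and should be checked against the updated docstring:

import torch
from flash_attn import flash_attn_varlen_func

# Toy sizes for illustration only (assumed layout, not authoritative).
num_heads, head_dim = 8, 64
page_size = 256                      # elements per KV-cache page (assumed)
seqlens_q, seqlens_k = [5, 7], [300, 500]
device, dtype = "cuda", torch.float16

# Queries stay packed varlen-style: (total_q, num_heads, head_dim).
q = torch.randn(sum(seqlens_q), num_heads, head_dim, device=device, dtype=dtype)

# With paging, K/V are page pools: (num_pages, page_size, num_heads_k, head_dim).
k_cache = torch.randn(6, page_size, num_heads, head_dim, device=device, dtype=dtype)
v_cache = torch.randn_like(k_cache)

# block_table maps each sequence to its pages: (batch, max_pages_per_seq).
block_table = torch.tensor([[0, 1], [2, 3]], device=device, dtype=torch.int32)

# Cumulative lengths, as in the existing varlen interface (int32 on GPU).
cu_seqlens_q = torch.tensor([0, 5, 12], device=device, dtype=torch.int32)
cu_seqlens_k = torch.tensor([0, 300, 800], device=device, dtype=torch.int32)

out = flash_attn_varlen_func(
    q, k_cache, v_cache,
    cu_seqlens_q, cu_seqlens_k,
    max_seqlen_q=max(seqlens_q), max_seqlen_k=max(seqlens_k),
    causal=True,
    block_table=block_table,  # new in this PR
)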

Tests:

pytest  tests/test_flash_attn.py -k test_flash_attn_varlen_causal 2>&1 | tee log.txt
collected 420068 items / 417188 deselected / 2880 selected

tests/test_flash_attn.py ............................................... [  1%]
........................................................................ [  4%]
...
........................................................................ [ 99%]
.........................                                                [100%]

=================== 2880 passed, 417188 deselected in 50.32s ===================

@sgrigory changed the title from "[DRAFT] Enable paged attention in varlen forward" to "Enable paged attention in varlen forward" on Feb 16, 2024
@sgrigory marked this pull request as ready for review on February 16, 2024 at 18:32
@rkooo567

Is this PR planned to be merged? It'd be very nice to have this feature!

@tridao merged commit 2a15840 into Dao-AILab:main on Mar 15, 2024
tridao (Contributor) commented Mar 15, 2024

Thank you @sgrigory!

@rkooo567

So when using this feature, am I supposed to pass k_cache and v_cache as the k and v arguments? Is that correct? (Maybe the docstring should be updated.)
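A comment-only sketch of my reading of the test code in this PR, for anyone with the same question (an assumption until the docstring is updated):

# Regular varlen path: K/V are packed along the token dimension, and the
# per-sequence slices are located via cu_seqlens_k.
#     k, v: (total_k, num_heads_k, head_dim)
#
# Paged path (this PR): the cache pools themselves are passed as k and v,
# and block_table records which pages each sequence owns.
#     k_cache, v_cache: (num_pages, page_size, num_heads_k, head_dim)
#     block_table: (batch, max_pages_per_seq), dtype torch.int32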

@rkooo567

Also, I assume cu_seqlens_k is just ignored if a block table is used?
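My reading of the kernel change, offered as an assumption rather than an answer: block_table replaces the key offsets, but cu_seqlens_k still seems to supply the per-sequence key lengths, i.e. roughly:

# Assumed semantics: per-sequence key lengths are still recovered from
# cu_seqlens_k even when block_table provides the page layout.
seqlens_k = cu_seqlens_k[1:] - cu_seqlens_k[:-1]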
